Summary

The application of DNA-based markers to discriminating among alternate salmon runs has evolved with ongoing genomic developments and has increasingly made it possible to resolve which genetic markers are associated with important life-history differences. Accurate and efficient identification of the most likely origin of salmon encountered in ocean fisheries, or at salvage from freshwater diversion and monitoring facilities, has far-reaching consequences for improving management, restoration and conservation measures. Near-real-time provision of high-resolution identity information enables prompt response to changes in encounter rates. We therefore continue to develop new tools that provide the greatest statistical power for run identification. As a proof of concept for genetic identification improvements, we conducted simulation and blind tests for 623 known-origin Chinook salmon (Oncorhynchus tshawytscha) to compare and contrast the accuracy of different population sampling baselines and microsatellite loci panels. This test included 35 microsatellite loci (1266 alleles). Some are known to be associated with specific coding regions of functional significance, such as the circadian rhythm cryptochrome genes, and others are not known to be associated with any functional importance. Fall run was identified with unprecedented accuracy. Overall, the top-performing panel and baseline (HMSC21) were predicted to have a success rate of 98%, but the blind-test success rate was 84%. Evidence for and against simulation bias is discussed to target primary areas for further research and resolution.
Introduction

Salmon are prized globally as a source of high-quality food. Chinook or King salmon (Oncorhynchus tshawytscha) has traditionally ranked as the most favored salmon species owing to its firm, high-nutrient flesh. Indeed, Chinook salmon was ranked among the top five of 60 wildlife species in an economic valuation of biodiversity (along with elk, moose, humpback whale and bald eagle; Martin-Lopez et al. 2008). The natural distribution of Chinook extends from Hokkaido Island (northern Japan) northward through Kamchatka, Russia, the Bering Sea and Alaska, to ocean territories west of Canada, Washington, Oregon and California. Today, this species is also spawned and reared in a substantial number of hatcheries distributed across this range, and in aquaculture enterprises in Chile, Brazil, Korea and New Zealand, where some naturalized populations have become established.

At the southeastern extreme of Chinook's natural distribution, California's Central Valley drainage stands out as a unique context for this species. Broad availability of extensive habitat, combined with a consistent supply of cold water from Sierra snowmelt, has supported the development here of the most diverse range of life-history types found anywhere. Accordingly, there are four primary runs, named fall, late-fall, winter and spring after the seasonal peaks in numbers of freshwater returns from the ocean (Fisher 1994). Run timing overlaps across seasons, and gravid Chinook may be found in the river essentially year round. Historically, however, the runs occupied spatially segregated spawning habitats. Winter run used spring-fed headwaters, spring run used higher elevation streams, late-fall run used mainstem rivers, and fall run used lower elevation rivers and tributaries (Yoshiyama et al. 2001). Today, however, approximately 70% of previously available habitat has been lost to reservoir impoundment or other uses. This raises questions as to how effectively these runs can maintain reproductively isolated breeding groups.

© 2014 The Authors. Animal Genetics published by John Wiley & Sons Ltd 
on behalf of Stichting International Foundation for Animal Genetics., 45, 412-420 
This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and 
distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made. 

Testing advances in molecular discrimination 413 

These four runs also often occur together during other phases of the Chinook's life cycle, for example as juvenile out-migrants through the Sacramento/San Joaquin Delta and San Francisco estuary, or during ocean-feeding migration. As migrants through the Delta, juvenile Chinook are exposed to large water export facilities operated by the State of California (State Water Project) and the U.S. Government (Central Valley Project). Some of these salmon subpopulations are listed as endangered (winter run) or threatened (spring run). There has therefore been active interest in developing reliable methods for identifying the run of sampled fish. This interest motivated early development of molecular and statistical tools for individual assignment, and Central Valley Chinook salmon were among the first salmonids to be individually assigned to run using molecular genetics (Banks et al. 1999, 2000). It has now been over a decade since that baseline was published. A central goal of our effort has been to develop and upgrade methodologies in order to provide the highest resolution for individual-based (rather than population-based) discrimination among these four runs of Central Valley Chinook salmon. Two primary approaches were addressed: (i) we sought markers directly linked to life-history traits that differ among the runs (such as run timing; O'Malley et al. 2007), and (ii) we employed statistical approaches to assess the relative power of alternate markers for run discrimination (Banks et al. 2003). The research presented here focused on improving molecular genetic discrimination among Chinook salmon of California's Central Valley. Three different microsatellite loci panels were compared across two different baseline collections of Chinook salmon.

Methods 

Baselines, subpopulation assemblages, sample collection and DNA extraction

This study compared and contrasted two baseline population genetic characterizations of Chinook salmon sampled from California's Central Valley drainage (Fig. 1), hereafter called baselines, and three different microsatellite loci panels. The first baseline collection, the Hatfield Marine Science Center (HMSC) baseline, was founded on Banks et al. (2000). Its samples were divided among five reporting groups. Three of the reporting groups corresponded to primary runs (winter, fall and late-fall). The other two corresponded to genetically distinct assemblages of spring run: (i) spring run from Butte Creek and (ii) spring run from Deer and Mill Creeks. These samples were arrayed across ten 96-well trays (two for each reporting group). They comprised a total of 936 samples, with between six and 86 samples for each of nine years and 24 run collections taken from 1991 to 1998 by the California Department of Fish and Game (CDFG) and the U.S. Fish and Wildlife Service (Table 1). The second baseline collection, the Genetic Analysis of Pacific Salmon (GAPS) Consortium baseline, was developed and standardized among 12 fisheries genetics laboratories in the Pacific Northwest (Seeb et al. 2007; Moran et al. 2013). Of its 166 population samples distributed from California to Alaska, nine were discrete population samples from California's Central Valley drainage. These baseline collections were divided among four reporting groups (the five described in Banks et al. 2000 and depicted in Table 1, except late-fall). To compare the assignment accuracy of these baselines, it was necessary to use common reporting groups. The GAPS baseline did not characterize any late-fall collections from California. Fall and late-fall results derived using the HMSC baseline in the present study were therefore pooled into a single fall-late-fall reporting group. This pooled fall-late-fall reporting group, derived from the GAPS and HMSC baselines, also included assignments to both spring and fall individuals from the Feather River Hatchery. These stocks were pooled because of known hybridization between them and the difficulty of resolving population identity between them (Banks et al. 2000; Hedgecock et al. 2001).
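The pooling rule described above amounts to a lookup from each baseline's reporting groups to the four common groups used for comparison. The following is a minimal sketch. The group labels are hypothetical, because the paper does not define machine-readable identifiers.

```python
from collections import Counter

# Hypothetical group labels; the paper does not publish identifiers.
# Each baseline reporting group maps onto one of the four common groups
# compared across baselines: W, SB, SDM and F-LF.
COMMON_GROUP = {
    "winter": "W",
    "spring_butte_creek": "SB",
    "spring_deer_mill_creeks": "SDM",
    "fall": "F-LF",
    "late_fall": "F-LF",                      # pooled: GAPS13 has no late-fall collections
    "spring_feather_river_hatchery": "F-LF",  # pooled: known spring x fall hybridization
}


def to_common_group(reporting_group: str) -> str:
    """Return the common reporting group (KeyError for an unknown label)."""
    return COMMON_GROUP[reporting_group]


def pooled_tally(assigned_groups):
    """Count baseline-level assignments after pooling into common groups."""
    return Counter(to_common_group(g) for g in assigned_groups)


if __name__ == "__main__":
    assignments = ["late_fall", "fall", "spring_feather_river_hatchery",
                   "winter", "spring_butte_creek"]
    print(pooled_tally(assignments))
```

The function raises on an unrecognized label rather than silently dropping it. A typo in a group name therefore cannot quietly shrink a reporting group.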

The 100%, jackknife and leave-one-out simulations available in population assignment applications may be useful for predicting the accuracy and precision provided by various genetic baselines. However, they may also give biased or overly optimistic indications. It is thus ideal to include samples of known origin, or 'blind samples', when evaluating assignment power. For this purpose, a total of 750 tissue samples from Chinook salmon of known life history, stored in the CDFG tissue archive, were coded to mask their identity. These samples enabled a blind test of the assignment accuracy of three alternate microsatellite panels. DNA extraction of blind-test samples followed a silica-based method using multichannel pipettes, PALL glass fiber filtration plates, and the buffer, centrifuge and transfer protocols described in Ivanova et al. (2006).

Microsatellite loci characterization 

Baseline and blind-test samples were characterized using three microsatellite panels, following the amplification protocols detailed in the references cited:

1 GAPS13 (from Seeb et al. 2007) included: Ogo-2, -4 (Olsen et al. 1998); Oki100 (Canadian Department of Fisheries and Oceans, unpublished); Omm1080 (Rexroad et al. 2001); Ots-3M (Greig & Banks 1999); Ots-9 (Banks et al. 1999); Ots-201b, -208b, -211, -212, -213 (Greig et al. 2003); OtsG474 (Williamson et al. 2002); and Ssa408 (Cairney et al. 2000).

2 HMSC16 (from Banks & Jacobson 2004) included: Ots-104, -107 (Nelson & Beacham 1999); Ots-201b, -208b, -209, -211, -212, -215 (Greig et al. 2003); Ots-G78b, -G83b, -G249, -G253, -G311, -G422, -G409 (Williamson et al. 2002); and Ots515 (Naish & Park 2002).

3 HMSC21 included the above 16 loci as well as an additional five microsatellites derived from research characterizing alternate copies of the circadian rhythm transcription factor cryptochrome: Cry2b.1, Cry2b.2, Cry3 (O'Malley et al. 2010), Ots-701 (GenBank Accession no. KF163438) and Ots-702 (GenBank Accession no. KF163440).

Alternate alleles were resolved through electrophoresis on an Applied Biosystems (ABI) 3730xl DNA analyzer and scored using ABI GeneMapper software (Version 4).

414 Banks et al.

Figure 1 Rivers and tributaries of California's Central Valley indicating Chinook salmon sampling sites per run and hatcheries.

Table 1 Collection data for California's Central Valley Chinook baseline populations from breeding stocks separated by run timing and location. The Hatfield Marine Science Center (HMSC) baselines are characterized at 16 and 21 microsatellite loci, respectively. GAPS13 (from the Genetic Analysis of Pacific Salmon Consortium) is a different baseline collection characterized at 13 microsatellite loci.

HMSC16 and HMSC21 baselines

Year         Sampling location                 Life stage           n
Winter
  1991       Keswick & Red Bluff Dams          Adult               17
  1992       Keswick Dam                       Adult               29
  1993       Keswick & Red Bluff Dams          Adult                9
  1994       Keswick Dam                       Adult               24
  1995       Keswick Dam                       Adult               25
  1998       Keswick Dam                       Adult               87
  Total                                                           191
Spring (Butte Creek)
  1994       Butte Creek                       Spawned carcass     50
  1996       Butte Creek                       Spawned carcass     12
  1997       Butte Creek                       Spawned carcass     60
  1998       Butte Creek                       Spawned carcass     62
  Total                                                           184
Spring (Deer & Mill Creeks)
  1994       Deer Creek                        Juvenile            12
  1995       Deer Creek                        Spawned carcass     13
  1995       Mill Creek                        Spawned carcass     10
  1996       Deer Creek                        Juvenile            68
  1996       Mill Creek                        Juvenile            12
  1997       Deer Creek                        Spawned carcass     38
  1998       Deer Creek                        Spawned carcass     26
  1998       Mill Creek                        Spawned carcass      6
  Total                                                           185
Fall
  1995       Nimbus Hatchery                   Adult               75
  1995       Mokelumne Hatchery                Adult               67
  1995       Merced Hatchery                   Adult               48
  Total                                                           190
Late-fall
  1993       Keswick Dam & Battle Creek        Adult               72
  1995       Coleman National Fish Hatchery    Adult               90
  1995       Keswick Dam                       Adult               24
  Total                                                           186

GAPS13 baseline

Year         Sampling location                 Life stage           n
Winter
  1992-5     Keswick & Red Bluff Dams          Adult               56
  1997       Keswick Dam                       Adult                3
  1998       Keswick Dam                       Adult               17
  2001       Keswick Dam                       Adult               35
  2003       Keswick Dam                       Adult               10
  2004       Keswick Dam                       Adult               15
  Total                                                           136
Spring (Butte Creek)
  2002       Butte Creek                       Adult               61
  2003       Butte Creek                       Adult               83
  Total                                                           144
Spring (Deer & Mill Creeks)
  2002       Deer Creek                        Adult               53
  2002       Mill Creek                        Adult               71
  2003       Mill Creek                        Adult               20
  Total                                                           144
Fall
  2002       Battle Creek                      Adult               67
  2003       Battle Creek                      Adult               77
  2003       Feather Hatchery                  Adult              144
  2002       Stanislaus River                  Adult               76
  2002       Tuolumne River                    Adult               68
  Total                                                           432
Late-fall
  Not sampled

Standardization of the HMSC baseline with the Abernathy Fish Technology Center

We used the standardization methods developed by the GAPS group (Seeb et al. 2007) to standardize amplification, electrophoresis, allele nomenclature and scoring methods between the HMSC and Abernathy Fish Technology Center (AFTC) laboratories. Briefly, this exercise involved sharing and evaluating three independent, coded 96-well plates containing Chinook salmon DNA samples:

1 Bin-definition plate 1 was passed from HMSC to AFTC along with genotype data. AFTC amplified and analyzed these samples in its laboratory using an ABI 3130 DNA Sequencer. This step allowed AFTC to calibrate its allele bins and to score alleles with HMSC allele nomenclature.

2 Test plate 1/bin-definition plate 2 was passed from HMSC to AFTC without any genotype data. AFTC analyzed these samples and reported results back to HMSC to assess standardization.

3 Test plate 2/bin-definition plate 3 was passed from HMSC to AFTC without genotype data. AFTC analyzed these samples and reported results to HMSC for a final assessment of standardization between laboratories.
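The paper does not specify how the bin-definition plate was used to calibrate AFTC allele bins. One simple approach is shown below purely as a hypothetical sketch. It is neither the GAPS protocol nor GeneMapper's binning algorithm. The sketch estimates a constant size offset between instruments from alleles scored on both, then snaps each shifted raw size to the nearest HMSC bin within a tolerance. All sizes, bins and the 0.5 bp tolerance are invented for illustration.

```python
from statistics import median


def estimate_offset(reference_calls, raw_sizes):
    """Median difference (bp) between reference allele calls and another
    laboratory's raw fragment sizes for the same alleles (bin-definition plate)."""
    diffs = [ref - raw for ref, raw in zip(reference_calls, raw_sizes)]
    if not diffs:
        raise ValueError("need at least one paired allele")
    return median(diffs)  # median resists a few mis-sized peaks


def to_reference_bin(raw_size, offset, bins, tolerance=0.5):
    """Shift a raw size by `offset` and snap it to the nearest reference bin.

    Returns None when no bin lies within `tolerance` bp, i.e. the call
    would need manual review rather than automatic binning.
    """
    shifted = raw_size + offset
    nearest = min(bins, key=lambda b: abs(b - shifted))
    return nearest if abs(nearest - shifted) <= tolerance else None


if __name__ == "__main__":
    hmsc_bins = [100, 102, 104, 106]            # hypothetical reference bins (bp)
    hmsc_calls = [100, 102, 104, 104]           # reference calls on the shared plate
    aftc_sizes = [100.7, 102.6, 104.8, 104.7]   # hypothetical raw sizes, same alleles
    offset = estimate_offset(hmsc_calls, aftc_sizes)
    print(round(offset, 2))
    print(to_reference_bin(106.8, offset, hmsc_bins))
    print(to_reference_bin(101.7, offset, hmsc_bins))
```

Returning None for out-of-tolerance sizes, instead of forcing the nearest bin, keeps ambiguous calls visible. Ambiguous calls are the kind of discrepancy the test plates were designed to catch.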



Assignment and statistical analysis 

In most scenarios in the lower reaches of the Sacramento River or the NW Pacific Ocean, fall and late-fall migrants substantially outnumber winter and spring migrants. Simulations performed to test for precision and accuracy were therefore designed to approximate these relative abundance differences. This was achieved using the 'realistic fishery' option within the statistical package oncor (Kalinowski 2008; www.montana.edu/kalinowski/Software/ONCOR.htm). Note that this technique uses a cross-validation over gene copies method, which has been shown to be less prone to over-optimistic estimates of assignment power than earlier methods (Anderson et al. 2008; Anderson 2010). For HMSC baselines, parameters were set to construct 1000 hypothetical mixtures of 100 individuals each. These mixtures used a 0.97 fraction for the fall-late-fall reporting group and a 0.01 fraction each for the winter, spring from Butte Creek, and spring from Deer and Mill Creeks reporting groups. For the GAPS13 baseline, parameters were set to construct 1000 hypothetical mixtures of 100 individuals each, using a 0.2475 fraction for Battle Creek fall, 0.2375 for Butte Creek fall, 0.2375 for Feather River Hatchery fall and 0.2375 for Stanislaus River fall. The GAPS13 simulation therefore had the same total 0.97 fraction for the fall-run reporting group, with 0.01 for Butte Creek spring, 0.01 for Deer Creek spring, 0.00 for Feather River Hatchery spring and 0.01 for the winter reporting group.

Blind-test samples were required to have complete multilocus genotypes for all three microsatellite panels, except that up to three missing loci were allowed. Run identities were assessed using oncor's 'assign individual to baseline population' option. Each individual was assigned to the reporting group for which it had the greatest probability (no probability cutoff was applied). Lower and upper 95% confidence intervals for realistic fishery simulation results were calculated using standard methods (P ± 1.96 × standard error; Sokal & Rohlf 1995).

For each possible pair of panels, we cross-tabulated the counts of the 750 blind-test samples correctly (true) versus incorrectly (false) identified by each panel, separately for each run. Both panels of each pair were identifying the same set of samples, so their correct identification proportions were not independent. We therefore used an exact version of McNemar's test (Agresti 2002; Zar 2010) for each pair of panels to test for the equality of those proportions.
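In McNemar's test, only discordant fish carry information: those classified correctly by one panel but not by the other. The following is a minimal sketch of the exact version. It assumes the common two-sided definition that doubles the smaller binomial tail. Applied to the discordant cells of Table 5, it gives the P-values listed there. For example, the winter run has 5 and 0 discordant fish for GAPS13 vs HMSC16, which gives P = 0.0625.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided McNemar P-value from the two discordant counts.

    b: fish classified correctly by panel 1 but not by panel 2
    c: fish classified correctly by panel 2 but not by panel 1
    Under H0 each of the n = b + c discordant fish is equally likely to fall
    in either cell, so min(b, c) ~ Binomial(n, 0.5); the tail is doubled.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant fish: no evidence of any difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Discordant cells from Table 5.
    print(mcnemar_exact(5, 0))  # winter, GAPS13 vs HMSC16
    print(mcnemar_exact(2, 0))  # spring from Butte Creek, GAPS13 vs HMSC16
    print(mcnemar_exact(1, 4))  # fall, GAPS13 vs HMSC16
    print(mcnemar_exact(1, 5))  # fall, GAPS13 vs HMSC21
```

With so few discordant fish, the smallest attainable P-value for the winter comparison is 0.0625. This explains why the authors describe that result only as "some evidence".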

Results 

Standardization results indicate that AFTC and HMSC allele scores averaged 97% identical for test plate one and 98% identical for test plate two (Table 2). One locus, Ots-208b, consistently scored below the 90% identity threshold identified by the GAPS Consortium (Seeb et al. 2007). Concordance between laboratories for the remaining loci was at least 90%, indicating that these loci had been successfully standardized.

Table 2 Percentage agreement in allele scoring between Abernathy Fish Technology Center and Hatfield Marine Science Center (HMSC) for microsatellite panel HMSC16.

Locus         Test plate 1   Test plate 2
Ots-104               95.9           99.4
Ots-107                100           98.8
Ots-201b              98.8           99.4
Ots-208b              88.3           87.7
Ots-209               97.7           97.1
Ots-211                 96            100
Ots-212               99.4           98.9
Ots-215                100            100
Ots-G249              99.4           97.8
Ots-G253b             92.5           98.9
Ots515                92.3           94.8
Ots-G311              99.2           99.3
Ots-G409              94.9           99.4
Ots-G422               100            100
Ots-G78b              94.4            100
Ots-G83b               100           99.4
Average               96.8           98.2
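The per-locus screen against the GAPS 90% identity threshold can be scripted. The following sketch assumes that agreement is measured allele by allele, with genotypes compared as unordered pairs and failed calls skipped. The paper does not state these conventions, and the helper names are hypothetical. The two dictionaries hold the Table 2 values. Screening them flags only Ots-208b and reproduces the 96.8% and 98.2% averages.

```python
from collections import Counter
from statistics import mean

GAPS_MIN_IDENTITY = 90.0  # % identity threshold used by the GAPS Consortium


def percent_identical(genotypes_a, genotypes_b):
    """Percentage of alleles scored identically by two laboratories.

    Each genotype is an (allele, allele) tuple, or None for a failed call.
    Samples failed in either lab are skipped; alleles are compared as an
    unordered pair (multiset intersection), so (102, 100) matches (100, 102).
    """
    shared = total = 0
    for a, b in zip(genotypes_a, genotypes_b):
        if a is None or b is None:
            continue
        shared += sum((Counter(a) & Counter(b)).values())
        total += 2
    if total == 0:
        raise ValueError("no genotypes scored by both laboratories")
    return 100.0 * shared / total


def below_threshold(agreement, threshold=GAPS_MIN_IDENTITY):
    """Loci whose per-locus agreement falls below the threshold."""
    return sorted(locus for locus, pct in agreement.items() if pct < threshold)


# Per-locus agreement for HMSC16, as reported in Table 2.
PLATE_1 = {"Ots-104": 95.9, "Ots-107": 100, "Ots-201b": 98.8, "Ots-208b": 88.3,
           "Ots-209": 97.7, "Ots-211": 96, "Ots-212": 99.4, "Ots-215": 100,
           "Ots-G249": 99.4, "Ots-G253b": 92.5, "Ots515": 92.3, "Ots-G311": 99.2,
           "Ots-G409": 94.9, "Ots-G422": 100, "Ots-G78b": 94.4, "Ots-G83b": 100}
PLATE_2 = {"Ots-104": 99.4, "Ots-107": 98.8, "Ots-201b": 99.4, "Ots-208b": 87.7,
           "Ots-209": 97.1, "Ots-211": 100, "Ots-212": 98.9, "Ots-215": 100,
           "Ots-G249": 97.8, "Ots-G253b": 98.9, "Ots515": 94.8, "Ots-G311": 99.3,
           "Ots-G409": 99.4, "Ots-G422": 100, "Ots-G78b": 100, "Ots-G83b": 99.4}

if __name__ == "__main__":
    for name, plate in (("Test plate 1", PLATE_1), ("Test plate 2", PLATE_2)):
        print(name, round(mean(plate.values()), 1), below_threshold(plate))
```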

Realistic fishery simulation results indicated strong potential for correct identity assignment (largely 90% or higher) for each of the three microsatellite panels (Table 3 and Fig. 2). Simulation results also showed a consistent ranking among the three panels. Correct assignment ranged from 70 to 100% for GAPS13, 90 to 100% for HMSC16 and 96 to 100% for HMSC21. Non-overlapping 95% confidence intervals reinforce three findings. (i) Correct assignment of spring run from Butte Creek was higher for HMSC16 and HMSC21 than for GAPS13. (ii) Correct assignment of spring run from Deer and Mill Creeks increased in the order GAPS13, HMSC16, HMSC21. (iii) HMSC16 ranked lower than GAPS13 and HMSC21 for pooled fall and late-fall assignments. Finally, all run assignment averages for both HMSC16 and HMSC21 were higher than for GAPS13.

Table 3 Summary percentage correct results of realistic fishery simulations assessed with each of the three panels for populations: W, winter; SB, spring from Butte Creek; SDM, spring from Deer and Mill Creeks; F-LF, fall and late-fall.

Run     GAPS                 HMSC16               HMSC21
W       100                  100                  100
SB      87.2 (83.6, 90.9)    98.4 (97.1, 99.8)    99.1 (98.1, 100.1)
SDM     69.7 (66.3, 73.2)    89.9 (86.6, 93.3)    95.8 (93.5, 98.0)
F-LF    99.2 (99.1, 99.3)    97.9 (97.8, 98.1)    99.2 (99.1, 99.3)
Ave     89                   96.6                 98.5

Values are percentages correct, with 95% confidence intervals in parentheses. GAPS, Genetic Analysis of Pacific Salmon Consortium; HMSC, Hatfield Marine Science Center.

Figure 2 Blind-test (n = 623) and simulation correct assignment results (n = 1000 for winter and spring reporting groups) among California Central Valley Chinook salmon, calculated using oncor (Kalinowski 2008) and assessed using three different microsatellite panels. Bars on simulations indicate 95% confidence intervals. Chinook salmon runs are indicated as follows: F&LF, pooled fall and late-fall runs; SB, spring from Butte Creek; SMD, spring from Mill and Deer Creeks; W, winter.

Table 4 Summary results of percentage correct assignment for each panel from blind-test samples (Blind) and simulations (Sims) for populations: W, winter; SB, spring from Butte Creek; SDM, spring from Deer and Mill Creeks; F-LF, fall and late-fall.

        GAPS               HMSC16             HMSC21
Run     Blind     Sims     Blind     Sims     Blind     Sims
W       92.61     100.0    95.45     100.00   95.45     100.00
SB      76.92     87.24    92.31     98.46    92.31     99.09
SDM     50.00     69.75    50.00     89.92    50.00     95.76
F-LF    99.72     93.80    97.45     97.94    99.07     99.24
Ave     79.81     87.70    83.80     96.58    84.21     98.52

GAPS, Genetic Analysis of Pacific Salmon Consortium; HMSC, Hatfield Marine Science Center.
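The intervals in Table 3 follow the P ± 1.96 × SE rule stated in Methods. The sketch below illustrates why the winter and spring intervals are much wider than the fall-late-fall ones. At the 0.97/0.01/0.01/0.01 proportions, 1000 mixtures of 100 fish contain roughly 97,000 fall-late-fall fish but only about 1000 of each minor group.

The sketch rests on several assumptions:
- Mixture composition is a multinomial draw.
- The standard error is a binomial SE pooled over all simulated fish of a group (oncor's internal procedure may differ).
- The per-group accuracy is a hypothetical placeholder, set equal for all groups so that only sample size varies.

Because the normal interval is not truncated, a high proportion can give an upper limit above 100%, as in the 100.1 reported for HMSC21 spring from Butte Creek. A proportion of exactly 100% gives SE = 0.

```python
import random
from math import sqrt

# Realistic-fishery proportions used for the HMSC baselines (Methods).
FRACTIONS = {"F-LF": 0.97, "W": 0.01, "SB": 0.01, "SDM": 0.01}


def simulate(accuracy, n_mixtures=1000, mixture_size=100, seed=1):
    """Tally simulated fish and correct assignments per reporting group.

    Mixture membership is a multinomial draw from FRACTIONS (an assumption
    about how mixtures are built). `accuracy` maps group -> hypothetical
    probability that a fish of that group is assigned correctly.
    """
    rng = random.Random(seed)
    groups = list(FRACTIONS)
    weights = [FRACTIONS[g] for g in groups]
    fish = dict.fromkeys(groups, 0)
    correct = dict.fromkeys(groups, 0)
    for _ in range(n_mixtures):
        for g in rng.choices(groups, weights=weights, k=mixture_size):
            fish[g] += 1
            if rng.random() < accuracy[g]:
                correct[g] += 1
    return fish, correct


def normal_ci(successes, trials, z=1.96):
    """Percent correct with P ± z * SE limits (binomial SE, not truncated)."""
    if trials == 0:
        raise ValueError("no simulated fish in this group")
    p = successes / trials
    se = sqrt(p * (1 - p) / trials)
    return 100 * p, 100 * (p - z * se), 100 * (p + z * se)


if __name__ == "__main__":
    fish, correct = simulate(accuracy=dict.fromkeys(FRACTIONS, 0.95))
    for g in FRACTIONS:
        p, lo, hi = normal_ci(correct[g], fish[g])
        print(f"{g:5s} n={fish[g]:6d}  {p:5.1f} ({lo:5.1f}, {hi:5.1f})")
    print(normal_ci(99, 100))  # upper limit exceeds 100%
```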

The blind test of actual power (based on 623 known-identity samples) indicated that simulation results were generally upwardly biased. The blind test nevertheless confirmed parallel relative rankings across runs and microsatellite panels (Fig. 2). Fewer fish from the winter run, spring run from Butte Creek, and spring run from Deer and Mill Creeks were correctly assigned than predicted. Fall-run blind-test assignments matched simulation estimates most closely.
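Each blind-test fish was assigned to the reporting group with the greatest probability, with no cutoff (Methods). The mechanics can be illustrated with a small allele-frequency likelihood. This is a simplified sketch and not oncor's estimator. It assumes Hardy-Weinberg genotype proportions within each reporting group, independence among loci and equal prior probabilities. It also uses a hypothetical floor (MIN_FREQ) for alleles unseen in a group, and the two-locus baseline is invented. It does apply the Methods rule that fish with more than three missing loci are excluded.

```python
from math import exp, log

MIN_FREQ = 0.001  # hypothetical floor for alleles not observed in a group
MAX_MISSING = 3   # Methods: at most three missing loci per blind-test fish


def genotype_loglik(genotype, freqs):
    """Log-likelihood of a diploid multilocus genotype in one reporting group.

    Assumes Hardy-Weinberg proportions within the group and independent loci.
    genotype: {locus: (allele1, allele2) or None if missing}
    freqs:    {locus: {allele: frequency}}
    """
    total = 0.0
    for locus, alleles in genotype.items():
        if alleles is None:
            continue  # a missing locus contributes no information
        a1, a2 = alleles
        p = freqs[locus].get(a1, MIN_FREQ)
        q = freqs[locus].get(a2, MIN_FREQ)
        total += log(p * p) if a1 == a2 else log(2 * p * q)
    return total


def assign(genotype, baseline):
    """Return (best group, posterior probabilities) under equal priors.

    No probability cutoff is applied, so a best group is always returned.
    """
    missing = sum(alleles is None for alleles in genotype.values())
    if missing > MAX_MISSING:
        raise ValueError(f"{missing} missing loci; at most {MAX_MISSING} allowed")
    logliks = {g: genotype_loglik(genotype, f) for g, f in baseline.items()}
    top = max(logliks.values())
    weights = {g: exp(ll - top) for g, ll in logliks.items()}  # avoids underflow
    norm = sum(weights.values())
    posterior = {g: w / norm for g, w in weights.items()}
    return max(posterior, key=posterior.get), posterior


# Invented two-locus baseline, for illustration only.
TOY_BASELINE = {
    "W":    {"L1": {100: 0.8, 102: 0.2}, "L2": {200: 0.5, 204: 0.5}},
    "F-LF": {"L1": {100: 0.1, 102: 0.9}, "L2": {200: 0.9, 204: 0.1}},
}

if __name__ == "__main__":
    best, post = assign({"L1": (100, 100), "L2": (200, 204)}, TOY_BASELINE)
    print(best, {g: round(p, 4) for g, p in post.items()})
```

Subtracting the largest log-likelihood before exponentiating keeps the posterior calculation stable when many loci drive raw likelihoods toward zero.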

Average realistic fishery simulation scores ranked the microsatellite panels HMSC21 first (98.5%), HMSC16 second (96.6%) and GAPS13 last (87.7%). This ranking was supported by blind-test assignment accuracies of 84.2% (HMSC21), 83.8% (HMSC16) and 79.8% (GAPS13) (Table 4). There is some evidence that HMSC16 and HMSC21 winter blind-test assignments were more often correct than were those of GAPS13 (McNemar's test, P = 0.0625; Table 5). However, we found no differences in the classification success rates of the three panels for any of the other runs (spring from Butte Creek, fall, and spring from Deer and Mill Creeks). In particular, HMSC16 and HMSC21 had identical classification success for all blind-test fish except those in the fall run (Table 5). Allele frequency data used in this study are available at OSU Scholars Archive (doi: 10.7267/N9KW5CXX).
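The 'Ave' rows in Table 4 are unweighted means of the four run-level percentages. For example, the GAPS blind average is (92.61 + 76.92 + 50.00 + 99.72)/4 ≈ 79.81. Each run therefore counts equally, whether it contributed 2 fish (spring from Deer and Mill Creeks) or 432 (fall-late-fall). A proportion pooled over all fish instead weights by sample size and is dominated by the fall run. The sketch below computes both summaries from a confusion matrix. Using the GAPS13 counts in Table 6, the unweighted mean comes out close to the Table 4 value, while the pooled figure is near 97%. The function names are illustrative.

```python
from statistics import mean

# Blind-test counts for the GAPS13 panel from Table 6:
# outer key = true run, inner key = run assigned.
GAPS13_COUNTS = {
    "W":    {"W": 163, "SB": 2,  "SDM": 0, "F-LF": 11},
    "SB":   {"W": 0,   "SB": 10, "SDM": 1, "F-LF": 2},
    "SDM":  {"W": 0,   "SB": 0,  "SDM": 1, "F-LF": 1},
    "F-LF": {"W": 1,   "SB": 1,  "SDM": 0, "F-LF": 430},
}


def per_run_accuracy(counts):
    """Percent of each true run that was assigned back to that run."""
    return {run: 100.0 * row[run] / sum(row.values()) for run, row in counts.items()}


def pooled_accuracy(counts):
    """Percent correct over all fish, i.e. weighted by each run's sample size."""
    correct = sum(row[run] for run, row in counts.items())
    total = sum(sum(row.values()) for row in counts.values())
    return 100.0 * correct / total


if __name__ == "__main__":
    acc = per_run_accuracy(GAPS13_COUNTS)
    print("unweighted mean of runs:", round(mean(acc.values()), 2))
    print("pooled over fish:", round(pooled_accuracy(GAPS13_COUNTS), 2))
```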

Discussion 

This study focused on discrimination among closely related Chinook salmon runs from the same primary watershed, runs that have lost 70% of the historic habitat that once kept them spatially segregated. Given this, a 98% overall correct assignment prediction from simulations, with blind-test affirmation at 84% correct, is astonishing. Similarly promising overall results have been obtained for sockeye salmon (Beacham et al. 2005), cod (Glover et al. 2010), cattle (Van de Goor et al. 2011), sheep (Niu et al. 2011) and cats (Kurushima et al. 2012). Indeed, HMSC21 blind-test correct assignment averages of 99% (fall), 95% (winter) and 92% (spring from Butte Creek) are especially encouraging. Accurate identification is particularly important for the endangered winter and threatened spring run life histories (NMFS 2009). These particular blind-test results were in close agreement with simulation predictions [fall: 99% (blind) and 99% (simulations); winter: 95% (blind) and 100% (simulations); spring from Butte Creek: 92% (blind) and 99% (simulations)] (Table 6). This general agreement is also very positive because previous simulation methods have suffered from upward bias in their assessment of likely assignment power (Anderson 2010).

Table 5 Comparisons of microsatellite panels in their classification success for three true runs. T denotes an accurately classified fish, and F denotes an error. P-values are for McNemar's test of equality in the proportions accurately classified by two panels. Spring run from Deer and Mill Creeks is not shown because all three panels had identical classification success.

        True run winter         True run spring from    True run fall
        (n = 176)               Butte Creek (n = 13)    (n = 432)
         H16-F  H16-T  P         H16-F  H16-T  P         H16-F  H16-T  P
G13-F        8      5                1      2                1      1
G13-T        0    163  0.0625        0     10  0.5           4    426  0.375

         H21-F  H21-T  P         H21-F  H21-T  P         H21-F  H21-T  P
G13-F        8      5                1      2                1      1
G13-T        0    163  0.0625        0     10  0.5           5    425  0.219

         H16-F  H16-T  P         H16-F  H16-T  P         H16-F  H16-T  P
H21-F        8      0                1      0                4      1
H21-T        0    168  1             0     12  1             2    425  1

G13, Genetic Analysis of Pacific Salmon Consortium panel; H16, Hatfield Marine Science Center 16 microsatellite panel; H21, Hatfield Marine Science Center 21 microsatellite panel.

Table 6 Blind-test result for 623 Chinook salmon. Rows indicate actual known identity; columns indicate where fish were assigned by three microsatellite panels: G, GAPS (Genetic Analysis of Pacific Salmon Consortium) or H, HMSC (Hatfield Marine Science Center).

      Winter (W)       Spring, Butte    Spring, Deer &   Fall (F)            Total
                       Creek (SB)       Mill Creeks (SDM)                   Actual
Run     G13  H16  H21    G13  H16  H21    G13  H16  H21    G13  H16  H21
W       163  168  168      2    1    1      0    1    1     11    6    6       176
SB        0    0    0     10   12   12      1    0    0      2    1    1        13
SDM       0    0    0      0    0    0      1    1    1      1    1    1         2
F-LF      1    1    1      1    2    1      0    2    4    430  427  426       432
Total                                                                         623

W, winter; SB, spring from Butte Creek; SDM, spring from Deer and Mill Creeks; F-LF, fall and late-fall.

Simulations predicted 96% correct assignment for spring run from Deer and Mill Creeks, but the blind test found only 50% for all three baselines. This wide difference indicates that the upward bias of simulation methods has not been completely eradicated. Only two samples of known Deer and/or Mill Creek spring-run origin were among the 623 samples considered in the blind test. This small sample size tempts one to suggest that the observed difference between simulation and blind-test findings likely results from chance. We suggest, however, that tests with similarly small sample sizes are appropriate, because threatened and endangered species are by definition always scarce. Identification applications commonly occur in contexts where endangered species are markedly outnumbered by their more abundant counterparts, such as the large fall and late-fall Chinook salmon runs in the current case. The cross-validation methods introduced by Anderson et al. (2008) and the 'realistic fishery' algorithms available in oncor (Kalinowski 2008) have begun to overcome the upward bias problem. Nevertheless, the results obtained here for spring run from Mill and Deer Creeks demonstrate that simulation methods still fall short of accurately predicting likely assignment power among closely related runs.

An earlier iteration of data for this blind test had a total n = 532. These 532 known-identity fish, however, happened to contain only one sample from Deer and Mill Creeks and 12 samples from Butte Creek spring runs. All three baselines correctly assigned these 13 spring samples to their known origin, except that GAPS13 misassigned two of the 12 Butte Creek spring fish. In that earlier test, the blind-test results for both spring run subpopulations were therefore 100% correct (83% for Butte Creek with GAPS13). These results were in closer agreement with simulation predictions and did not show any upward bias. Because both spring run subpopulations were represented by few samples in the first blind test of 532 fish, we returned to the original 750 blind-test samples to derive more data. This increased our total number (n) to 623, but did not substantially increase the number of spring-run fish in the blind test. These results underscore the importance of evaluating the accuracy of a classification process with data that are separate from those used to train it (Anderson 2010).
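To gauge how informative a shortfall is at such small sample sizes, one can ask how likely the blind-test result, or a worse one, would be if the simulated accuracy were the true accuracy. The following is a minimal sketch under two assumptions that the original analysis does not make: the simulated percentage is the true per-fish success probability, and assignments are independent Bernoulli trials. For the Deer and Mill Creek result (1 of 2 correct in Table 6, against 95.76% simulated for HMSC21 in Table 4), the probability is about 0.083. This illustrates why a two-fish sample cannot by itself separate chance from genuine bias.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


if __name__ == "__main__":
    # Spring run, Deer and Mill Creeks: 1 of 2 correct in the blind test,
    # against a simulated HMSC21 accuracy of 95.76%.
    print(round(binom_cdf(1, 2, 0.9576), 3))
    # Spring run, Butte Creek: 12 of 13 correct, simulated HMSC21 accuracy 99.09%.
    print(round(binom_cdf(12, 13, 0.9909), 3))
```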



No samples from any late-fall run were included in the GAPS13 baseline. However, blind-test and simulation results for late-fall run in the HMSC baselines provided further information about bias. The blind sample of 623 included a total of 77 late-fall-run fish (data not shown). Simulation tests predicted a 91% success rate for late-fall, yet the blind-test score was only 44% correct. This was not unexpected, because fall and late-fall runs are the most closely related of all Central Valley population pairs (fall-late-fall pairwise FST = 0.02 vs. average FST for all subpopulations = 0.08). Indeed, late-fall-run misassignments were largely to fall run. Note, however, that n = 77 late-fall samples is no longer small, yet this run showed the largest upward bias observed between simulation and blind-test results. In contrast, this upward bias of simulation prediction was not observed for fall run. When fall and late-fall runs were considered separately, the n = 623 blind test had 157 fall-run samples. Of these, 153 (97%) were correctly identified by HMSC21, in exact agreement with the simulation prediction of 97%.

Across the microsatellite panels, overall correct assignment increased from GAPS13 to HMSC16 to HMSC21 in parallel with the increasing number of loci, as observed in other studies (Bjornstad & Roed 2002; Bamshad et al. 2003; Tadano et al. 2008). This pattern is supported by consistent simulation rankings for each of the runs, except that GAPS13 moved to second place for combined fall-late-fall simulation assignments. It is also supported by marginal McNemar evidence for the same 13-16-21 loci ranking in blind-test assignment. However, despite consistent top performance for HMSC21, the margins separating results were not large enough to establish this ranking statistically. HMSC16 and HMSC21 panels performed largely the same in the blind test, but simulations indicate the added value of additional loci for discrimination among fall and spring runs (Fig. 2). Fall-spring and fall-late-fall discrimination remain the areas of greatest challenge for accurate individual-based population assignment among California's Central Valley Chinook salmon. Nevertheless, fall-run identification across all baselines and microsatellite panels, including both blind-test and simulation results, was high (average 98% correct).
This level of success is a first and likely has strong application potential. Regionally, California's Central Valley Chinook salmon returns have been disturbingly low in recent years. Precipitously low numbers of Central Valley fall-run Chinook salmon were the primary driving force for a complete ocean fishery closure in 2008 and 2009 (NMFS 2009). This situation had significant negative economic consequences for the region. It motivates continued efforts, such as the molecular and statistical methods covered here, to better quantify the accuracy of individual-based population identity determination for improved management, monitoring and conservation.